Selective Provenance for Datalog Programs Using Top-K Queries
نویسندگان
چکیده
Highly expressive declarative languages, such as datalog, are now commonly used to model the operational logic of dataintensive applications. The typical complexity of such datalog programs, and the large volume of data that they process, call for result explanation. Results may be explained through the tracking and presentation of data provenance, and here we focus on a detailed form of provenance (howprovenance), defining it as the set of derivation trees of a given fact. While informative, the size of such full provenance information is typically too large and complex (even when compactly represented) to allow displaying it to the user. To this end, we propose a novel top-k query language for querying datalog provenance, supporting selection criteria based on tree patterns and ranking based on the rules and database facts used in derivation. We propose an efficient novel algorithm based on (1) instrumenting the datalog program so that, upon evaluation, it generates only relevant provenance, and (2) efficient top-k (relevant) provenance generation, combined with bottom-up datalog evaluation. The algorithm computes in polynomial data complexity a compact representation of the top-k trees which may be explicitly constructed in linear time with respect to their size. We further experimentally study the algorithm performance, showing its scalability even for complex datalog programs where full provenance tracking is infeasible.
منابع مشابه
Circuits for Datalog Provenance
The annotation of the results of database queries with provenance information has many applications. This paper studies provenance for datalog queries. We start by considering provenance representation by (positive) Boolean expressions, as pioneered in the theories of incomplete and probabilistic databases. We show that even for linear datalog programs the representation of provenance using Boo...
متن کاملEfficiently Computing Provenance Graphs for Queries with Negation
Explaining why an answer is in the result of a query or why it is missing from the result is important for many applications including auditing, debugging data and queries, and answering hypothetical questions about data. Both types of questions, i.e., why and why-not provenance, have been studied extensively. In this work, we present the first practical approach for answering such questions fo...
متن کاملA PROV Encoding for Provenance Analysis Using Deductive Rules
PROV is a specification, promoted by the World Wide Web consortium, for recording the provenance of web resources. It includes a schema, consistency constraints and inference rules on the schema, and a language for recording provenance facts. In this paper we describe a implementation of PROV that is based on the DLV Datalog engine. We argue that the deductive databases paradigm, which underpin...
متن کاملCombined Tractability of Query Evaluation via Tree Automata and Cycluits (Extended Version)
We investigate parameterizations of both database instances and queries that make query evaluation fixed-parameter tractable in combined complexity. We introduce a new Datalog fragment with stratified negation, intensional-clique-guarded Datalog (ICG-Datalog), with linear-time evaluation on structures of bounded treewidth for programs of bounded rule size. Such programs capture in particular co...
متن کاملImplementing Unified Why- and Why-Not Provenance Through Games
Using provenance to explain why a query returns a result or why a result is missing has been studied extensively. However, the two types of questions have been approached independently of each other. We present an efficient technique for answering both types of questions for Datalog queries based on a game-theoretic model of provenance called provenance games. Our approach compiles provenance r...
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
عنوان ژورنال:
- PVLDB
دوره 8 شماره
صفحات -
تاریخ انتشار 2015